Protein evolution by molecular breeding

Jeremy Minshull* and Willem PC Stemmer†



Natural evolution has guided the development of 'molecular breeding' processes used in the laboratory for the rapid modification of subgenomic sequences, including single genes. The most significant recent development has been the in vitro permutation of natural diversity. Homologous recombination of multiple related sequences has produced high-quality libraries of chimeric sequences encoding proteins with functions that differ dramatically from those of any of the parents. Increasingly powerful screening methods are also being developed, allowing these libraries to be screened for novel biocatalysts.
Introduction

Enzymes are used in a wide variety of applications, including food and feed processing, laundry detergents, chemicals production, paper bleaching and pharmaceutical manufacturing. The benefits of using enzymes as catalysts are that reactions can occur at moderate temperatures, toxic solvents or reactants can often be eliminated, and reactions are usually stereospecific, which is of particular benefit in the synthesis of pharmaceuticals and fine chemicals. The specificity of enzymes also obviates the need for protecting and deprotecting reactive groups, which is a source of considerable yield loss in organic syntheses.

Although three billion years of evolution have produced a wealth of protein catalysts, they are generally not optimal for a particular industrial application. While it is possible to screen enzymes from extremophiles for activity under the appropriate process reaction conditions [1,2], natural selection has shaped enzymes to function in the complex mixtures of molecules within cells rather than in bioreactors. Obtaining the desired combination of properties therefore generally requires further protein optimization.

Structural information has been used with some success to improve enzyme function [3-5]. As a general approach, however, structure-based design requires considerable time and equipment to generate and process very large amounts of information.

An alternative to making defined changes on the basis of structural understanding is to harness the Darwinian power of recursive cycles of mutation and selection. By using directed evolution, protein engineers attempt to mimic the natural processes by which protein variants arise and are tested for 'fitness' within living systems. In this review, we focus on the rationale behind directed evolution and on recent advances in it, both in the methods used to generate protein variants and in the screening strategies used to identify variants of interest.

DNA shuffling 

Directed evolution effectively performs the complex computations required to determine the effects of changes in sequence on catalytic function. In addition to active-site geometry, the effects of sequence changes on protein expression, stability and folding, and on interactions with other host proteins and small molecules, are all considered simultaneously, simply by directly measuring the activity of the mutant enzymes or metabolic pathways.

The best evolutionary strategies are likely to be those that most closely mimic natural ones: over three billion years, not only have individual genes evolved, but the evolutionary process itself has been optimized [6]. The algorithms that are best at searching through the possible combinations of nucleotides for sequences with biological function have been preserved along with the sequences whose evolution they have facilitated. Recombination is one such mechanism, found universally in biological systems. Genetic algorithms and other computer simulations of simple evolving systems that incorporate the ability to recombine information are more powerful, and evolve more rapidly, than those that do not [6-9].
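The comparison between simulated evolving systems with and without recombination can be explored with a small genetic algorithm. The sketch below is a minimal illustration under stated assumptions, not a model of any published experiment: the 'OneMax' fitness function (the number of beneficial positions), the population size, mutation rate, truncation selection and uniform crossover are all hypothetical choices, and the function and parameter names are illustrative only.

```python
import random


def fitness(genome):
    # Toy "OneMax" landscape: fitness is the number of beneficial (1) positions.
    return sum(genome)


def mutate(genome, rate, rng):
    # Flip each position independently with probability `rate`.
    return [1 - g if rng.random() < rate else g for g in genome]


def crossover(a, b, rng):
    # Uniform crossover: each position is inherited from one of the two parents.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def evolve(length=50, pop_size=40, generations=15, rate=0.01,
           recombine=True, seed=0):
    """Return the best fitness after `generations` rounds of selection."""
    rng = random.Random(seed)
    population = [[0] * length for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half become parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            if recombine:
                # Recombine two selected parents, then mutate the offspring.
                a, b = rng.sample(parents, 2)
                child = crossover(a, b, rng)
            else:
                # Asexual copy of one selected parent, then mutate.
                child = list(rng.choice(parents))
            children.append(mutate(child, rate, rng))
        population = children
    return max(fitness(g) for g in population)


for seed in range(3):
    print(seed, evolve(recombine=True, seed=seed), evolve(recombine=False, seed=seed))
```

In toy problems of this kind, crossover can bring together beneficial mutations that arose in different lineages, which is the intuition behind the cited simulations; how large any advantage turns out to be here depends on the parameters and seeds chosen.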

Incorporation of recombination into a method for the directed evolution of single genes (known as 'DNA shuffling' or 'molecular breeding') was developed recently [10]. In this method, a population of mutant genes (rather than just one) is selected on the basis of containing beneficial mutations, which makes them appropriate as parents for the next cycle. The genes are randomly fragmented and then reassembled by recombination with each other. The process is shown schematically in Figure 1. As well as accelerating the in vitro evolutionary process [10-12], the shuffling reaction is extremely flexible: many different pieces of genetic information may be included if they are available (see Figure 2; [13]). For example, Liu et al. [14] included degenerate oligonucleotides in their shuffling reaction in order to randomize amino acids that structural studies suggested were important for the substrate specificity of a tRNA synthetase. Interestingly, only one of the five targeted residues was mutated in the enzyme showing the highest activity against the new substrate.
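The fragment-and-reassemble step can be caricatured in code. The sketch below is a toy model, assuming aligned, equal-length parent sequences: it builds a chimera from random-length fragments and allows a switch to another parent at a fragment boundary only where the parents are identical over a short preceding window, a stand-in for the sequence identity a fragment needs in order to prime on another template. The function name, fragment sizes and `min_overlap` value are hypothetical, and real reassembly (annealing, polymerase extension, point mutations introduced during PCR) is not modelled.

```python
import random


def shuffle_genes(parents, frag_min=10, frag_max=50, min_overlap=4, seed=None):
    """Toy DNA-shuffling model for aligned, equal-length parent genes.

    Returns the chimera and a list of (parent index, start, end) fragments.
    """
    rng = random.Random(seed)
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to the same length")
    current = rng.randrange(len(parents))
    pos, pieces, origins = 0, [], []
    while pos < length:
        # Copy one random-length fragment from the current template.
        end = min(length, pos + rng.randint(frag_min, frag_max))
        pieces.append(parents[current][pos:end])
        origins.append((current, pos, end))
        pos = end
        if min_overlap <= pos < length:
            # A template switch needs identity over the preceding window;
            # the current parent always qualifies, so the list is never empty.
            window = parents[current][pos - min_overlap:pos]
            compatible = [i for i, p in enumerate(parents)
                          if p[pos - min_overlap:pos] == window]
            current = rng.choice(compatible)
    return "".join(pieces), origins


parents = ["ATGGCTAAAGGTCTGACC" * 5, "ATGGCAAAAGGCCTGACC" * 5]
chimera, origins = shuffle_genes(parents, seed=42)
print(chimera)
print(origins)  # (parent index, start, end) for each fragment
```

Because switches are only permitted across identical windows, crossovers in this model cluster where the parents are most similar, which mirrors the dependence of shuffling on sequence homology.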



Figure 1

In vitro recombination by DNA shuffling. Genes are fragmented and then reassembled by a reaction in which homologous fragments act as primers for each other. (Schematic panels: starting genes; gene fragments; shuffled gene.)

Many examples of successful directed evolution using DNA shuffling have been reviewed recently [15,16]. Last year, several additional formats were described for the in vitro [17,18] or in vivo [19] shuffling of genes. Although these methods have not been thoroughly compared, they rely on the same underlying principle: that the most efficient way to explore the possible combinations and permutations of sequences (i.e. sequence space) is by recombination of active variants.
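Simple counting shows why sequence space cannot be explored exhaustively, and why recombining active blocks restricts the search to a far smaller subset. The helper names and the example sizes below are hypothetical; the formulas are standard combinatorics: 20^L sequences of length L, n^k chimeras when each of k aligned blocks may come from any of n parents, and C(L, k) × 19^k variants carrying exactly k substitutions.

```python
from math import comb


def sequence_space(length, alphabet=20):
    # All possible sequences of `length` residues.
    return alphabet ** length


def block_chimeras(n_parents, n_blocks):
    # Each aligned block can be inherited from any one parent.
    return n_parents ** n_blocks


def k_point_mutants(length, k, alphabet=20):
    # Choose k positions, then one of (alphabet - 1) alternatives at each.
    return comb(length, k) * (alphabet - 1) ** k


print(sequence_space(300))                       # a 300-residue protein
print(block_chimeras(4, 10))                     # 1048576 (four parents, ten blocks)
print(k_point_mutants(300, 1), k_point_mutants(300, 3))
```

The block count assumes crossovers only at fixed block boundaries, a simplification; it nevertheless illustrates that every member of a chimeric library is built from sequence that has already been tested by nature.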

Screening and selection 

Natural evolution measures the fitness of variants by their ability to survive. In some cases, genetic selections can be employed to make a cell's growth dependent on a particular improved function. Schellenberger's group [20] recently selected for increased subtilisin production by making a target protein the sole source of nitrogen, performing the growth in hollow fibres to prevent cross-feeding. As an artificial selection system, phage display has been used to identify proteins that bind specific ligands. Catalytic proteins displayed on phage have also been selected, either by making infectivity dependent on formation of a covalent intermediate [21], or by requiring enzyme activity to release the phage from a solid matrix [22]. Both of these methods require only a single catalytic event, so they are unsuitable for quantitative measurements.

Directed evolution has been used to enhance lipase enantioselectivity. Lipases accept a wide variety of non-natural esters, so lipases that are able to discriminate between stereoisomers allow the production of optically pure compounds useful in pharmaceutical and fine chemical manufacture. One group used a microtitre-plate absorbance assay in which the esterase activity of lipase variants was measured against the R and S forms of p-nitrophenyl 2-methyldecanoate. Four cycles, testing 1,000 lipase mutants per cycle, increased the enantioselectivity from 2% enantiomeric excess (ee) to 81% ee in favour of the S configuration [23].

Figure 2

The shuffling reaction is extremely flexible. Positive variants resulting from random mutation and selection can be recombined with sequence information obtained computationally. Genomics allows the inclusion of related genes from other species, and structural information can be used to design synthetic oligonucleotides for making specific changes or for randomizing targeted regions of a protein.

A second group evolved an enzyme to hydrolyze an ester for production of an intermediate in epothilone synthesis. The initial screen for this enzyme was performed by including both the enzyme substrate and a pH indicator in agar plates. Bacterial colonies expressing an enzyme able to hydrolyze the ester were identified by a change in the colour of the indicator, since acid is released when esters are hydrolyzed. Colonies selected by this screen were then picked and tested for their biotransformation activity and stereoselectivity by measuring the optical rotation of the products [24]. While individual screens will always depend on the reactions being catalyzed, this strategy of tiered screening, in which a primary, relatively inaccurate assay is used to select a small number of clones that are then subjected to more detailed analysis (see Figure 3), is an extremely powerful general technique.
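A two-tier screen of this kind can be sketched as follows, assuming a hypothetical library in which each variant is described only by its initial rates toward the S and R substrates. The primary assay stands in for a plate colour change (total hydrolysis plus random noise), and the secondary assay computes enantiomeric excess, ee = (S - R)/(S + R) × 100%, for the primary hits only. All names, the noise level and the kept fraction are illustrative assumptions.

```python
import random


def enantiomeric_excess(major, minor):
    # ee (%) in favour of `major`: (major - minor) / (major + minor) * 100.
    return 100.0 * (major - minor) / (major + minor)


def make_library(n, seed=0):
    # Hypothetical variants: initial rates toward the S and R ester substrates.
    rng = random.Random(seed)
    return [{"id": i, "s": rng.uniform(0.01, 1.0), "r": rng.uniform(0.01, 1.0)}
            for i in range(n)]


_noise = random.Random(1)


def primary_assay(variant):
    # Stand-in for a plate colour change: total hydrolysis plus assay noise.
    return variant["s"] + variant["r"] + _noise.gauss(0.0, 0.3)


def secondary_assay(variant):
    # Stand-in for an accurate, low-throughput measurement of ee (S product).
    return enantiomeric_excess(variant["s"], variant["r"])


def tiered_screen(variants, primary, secondary, keep_fraction=0.05):
    # Tier 1: rank every variant with the cheap, noisy assay.
    ranked = sorted(variants, key=primary, reverse=True)
    hits = ranked[: max(1, int(len(ranked) * keep_fraction))]
    # Tier 2: re-rank only the primary hits with the accurate assay.
    return sorted(hits, key=secondary, reverse=True)


library = make_library(1000)
best = tiered_screen(library, primary_assay, secondary_assay)
for v in best[:3]:
    print(v["id"], round(secondary_assay(v), 1))
```

Only 50 of the 1,000 variants reach the second tier here. As in the epothilone example, the primary assay selects for activity and the secondary assay for selectivity, so the scheme is only useful if variants passing the first tier are enriched for the property that ultimately matters.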

It is also possible to perform an entire selection in vitro. As an example, a library of genes was transcribed and translated in compartments formed in a water-in-oil emulsion. Active HaeIII DNA methyltransferase enzymes methylated the genes that encoded them, thereby protecting that DNA from subsequent digestion by the HaeIII restriction enzyme [25]. With such a system, cloning or transformation of the library is not required, so much larger libraries can be screened. Further advances, such as coupled reactions leading to gene modification and sorting of intact compartments on the basis of fluorescence, would help make in vitro enzyme production and testing a very powerful methodology.

Using natural diversity 

In addition to developing screening strategies that allow greater numbers of mutants to be screened, directed evolution can be optimized by building protein libraries that contain the maximum number of active (and different) members. Until this year, single genes were used as starting points for DNA shuffling, and the variants, arising by point mutation, were very similar in sequence to the parent gene. Another approach uses principles similar to those of the mammalian immune system. Antibodies capable of binding essentially any epitope with nanomolar affinities are generated by recombination between a few thousand sequences, followed by 'affinity maturation' by point mutation [26]. Enzyme catalysis results from binding to and stabilizing the relevant transition state [27], so it should be possible to harness such a system to produce enzymes [28]. Antibodies have evolved as rigid binding molecules, however, and catalytic antibodies are selected solely for their ability to bind transition-state analogues rather than for other enzymatically essential functions, such as substrate binding and product release. They are thus generally much less active as catalysts than proteins that have evolved as enzymes.

Instead of trying to turn antibodies into catalysts, DNA shuffling can be used to mimic the immune system's powerful diversity-generating process by recombining genes with one another. In the first example of 'DNA family shuffling', four different β-lactamase genes were shuffled together to produce a chimera with 270-fold greater resistance to moxalactam than the best parental enzyme [29]. The chimeric enzyme produced in this experiment differed from each parent by at least 100 amino acids (Figure 4), yet was still a fully functional cephalosporinase. Like antibody 'diversity' regions, sequences that occur in naturally existing enzymes have already been tested for their ability to function within the context of the protein's overall structure. Recombining natural blocks of sequence with each other allows a broad region of functional sequence space to be sampled sparsely.

Protein chimeras may differ dramatically from all their parents

Where an active site lies at the interface between folding subdomains, exchanging these subdomains will alter the shape of the active site. For example, swapping domains between coagulation factor X and trypsin produced a serine protease with broadened substrate selectivity [30]. The activities of chimeric enzymes are often not predictable simply by comparing those of the parent enzymes, as was found for chimeras between two human blood group glycosyltransferases that were shown to be functionally interconvertible by changing only four amino acids. Parental enzyme A transfers N-acetylgalactosamine to a disaccharide acceptor, whereas enzyme B transfers galactose. Replacement of Arg176 in enzyme A with the Gly176 of enzyme B resulted not in increased B-like activity, but in a fourfold higher Vmax/KM for the enzyme A substrate (i.e. N-acetylgalactosamine) [31].

Figure 3

Tiered screening. Variants are tested by a series of assays that are successively more accurate and more time- and labour-intensive. It is important to ensure that the higher-capacity assays correlate well with the desired final activity. FACS, fluorescence-activated cell sorting; HPLC, high-pressure liquid chromatography.

Altered substrate specificities have also been produced by random recombination of sequences followed by screening. Biphenyl dioxygenases initiate the degradation of polychlorinated biphenyls, and their congener substrate specificities are determined by the large terminal subunit [32]. DNA shuffling of two such dioxygenases produced chimeras with a substrate range different from that of either parent, enhanced degradation of biphenyl compounds and even novel oxygenation activity toward single-ring aromatic hydrocarbons [33,34].

Random chimeras have also been made in vivo between two staphylococcal lipases with differing chain-length selectivities and phospholipase activities. Novel enzymes were found that possessed combinations and absolute levels of these activities that differed from both parents, often in surprising ways [35]. For example, one chimera, in which a block comprising 20% of the enzyme with no chain-length selectivity was incorporated into the enzyme with a strong preference for short-chain fatty acids, unexpectedly showed twofold increased activity (relative to the best parent) against the long-chain ester p-nitrophenyl palmitate.
Figure 4

Mutational distances of the chimeric β-lactamase with 270-fold improved moxalactamase activity from its four parents. Distances from each parent are given as the number of amino acids and as the percentage of residues that this represents. The chimera differs by 102 amino acids, that is 27% of positions, from its closest parent (the Citrobacter enzyme). It would not be possible to make 102 random changes without inactivating the enzyme. Thus recombination of natural diversity allows functional sequence space to be sampled much more broadly and sparsely than sequential point mutation from a single starting sequence allows.
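Distances of the kind given in Figure 4 are counts of differing positions in an alignment, also expressed as a percentage. A minimal sketch, assuming the chimera and parents are already aligned to equal length with '-' for gaps; the gap handling here (a residue opposite a gap counts as a difference, gap-gap columns are skipped) is an assumption and not necessarily the convention used for the figure, and the sequences are hypothetical toys, not the real β-lactamases.

```python
def mutational_distance(a, b, gap="-"):
    """Differences between two aligned sequences as (count, percent).

    Columns where both sequences have a gap are skipped; a residue opposite
    a gap is counted as a difference (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    diffs = sum(1 for x, y in columns if x != y)
    return diffs, 100.0 * diffs / len(columns)


def closest_parent(chimera, parents):
    # parents: mapping of name -> aligned sequence; returns (name, count, percent).
    scored = [(name, *mutational_distance(chimera, seq))
              for name, seq in parents.items()]
    return min(scored, key=lambda t: t[1])


# Hypothetical toy alignment (not real beta-lactamase sequences).
parents = {"parent_1": "MKLVA-GHTE", "parent_2": "MRLIAQGHSE"}
print(closest_parent("MKLIAQGHSE", parents))  # ('parent_2', 1, 10.0)
```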



Recursive cycles of shuffling using multiple parents have been performed by Christians et al. [36]. By recombining two herpes simplex virus thymidine kinase genes and robotically screening for variants that were better able to phosphorylate the therapeutic nucleoside analogue AZT, they reduced the concentration of AZT required to inhibit cell growth 32-fold relative to that required with the best parent. The resulting enzyme was a chimera that had undergone ten crossover events between the two parental genes and had also accumulated five point mutations, giving a protein differing by 22 amino acids from the closest parent. The process of recombination between different but functional parents to make large changes in sequence, coupled with point mutagenesis to fine-tune the activity of the protein, is highly analogous to the process of antibody generation and maturation.

Directed searches for novel protein activities 

Although it is possible to modify the physical properties of an enzyme, such as thermostability or activity in organic solvent, by screening for sequential improvements in these properties [37-39], modification of one property by single point mutations can often compromise another desired characteristic [40]. From the results discussed above, we would predict that by recombining sequences found in nature, it should be possible to discover enzymes possessing all combinations of the properties of the individual parents, as well as improvements over any of the parents.

The classification of enzymes into superfamilies that appear to be related by a common chemical strategy for stabilizing the transition state for the formation of a reactive intermediate suggests a mechanism by which nature may evolve novel catalytic functions [41]. Is it possible to make such changes in the laboratory? It may not be possible to make a graded change from one reaction to another. By making structural comparisons between an oleate desaturase and an oleate hydroxylase, Broun et al. [42] have shown that four amino acid changes in the desaturase can convert it to a hydroxylase, and that changing six residues in the hydroxylase results in desaturase activity. Making these changes by sequential point mutagenesis and screening would not be possible, because the single or double mutants do not possess intermediate activities. The exchange of blocks of amino acids made possible by family shuffling, however, offers a possible route to completely novel substrate specificities. Enzyme libraries constructed from relatively small families of homologous genes are likely to contain not only a range of substrate specificities, but also a variety of physical properties and even new catalytic activities. These libraries can then serve as sources of diversity themselves, providing the starting points for further directed evolution in many different directions.
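The argument that an all-or-nothing requirement blocks stepwise improvement, but not block exchange, can be illustrated with a deliberately simple toy landscape. Everything below is hypothetical: a ten-residue 'protein', a 'donor' sequence standing in for the homologous enzyme, and a new activity that appears only when all four key positions carry the donor residues (loosely echoing the four changes reported by Broun et al., but not their actual residues or positions). For simplicity the key positions fall within one contiguous block.

```python
KEY_POSITIONS = (2, 4, 5, 7)  # hypothetical positions required for the new activity


def new_activity(seq, donor):
    # All-or-nothing: activity only when every key position matches the donor.
    return 1.0 if all(seq[i] == donor[i] for i in KEY_POSITIONS) else 0.0


def greedy_point_mutation(start, donor):
    """Accept single substitutions (toward donor residues) only if activity rises."""
    seq = list(start)
    current = new_activity(seq, donor)
    improved = True
    while improved:
        improved = False
        for i in range(len(seq)):
            if seq[i] == donor[i]:
                continue
            trial = seq[:i] + [donor[i]] + seq[i + 1:]
            if new_activity(trial, donor) > current:
                seq, current, improved = trial, new_activity(trial, donor), True
                break
    return "".join(seq), current


def block_exchange(start, donor, lo, hi):
    # Replace the block start[lo:hi] with the donor's block in a single step.
    chimera = start[:lo] + donor[lo:hi] + start[hi:]
    return chimera, new_activity(chimera, donor)


start, donor = "ACDEFGHIKL", "ACWEYQHRKL"
print(greedy_point_mutation(start, donor))  # ('ACDEFGHIKL', 0.0): no single step helps
print(block_exchange(start, donor, 2, 8))   # ('ACWEYQHRKL', 1.0)
```

Because no single substitution changes the measured activity, a screen for stepwise improvement has nothing to select, whereas exchanging the whole block reaches the active sequence in one move. A block that covers only some key positions (for example `block_exchange(start, donor, 2, 5)`) still scores zero.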

Conclusions 

By copying the natural mechanisms by which existing diversity is recombined, DNA shuffling can be used to generate high-quality libraries of novel proteins. Chimeras between naturally occurring enzymes that differ by only a few amino acids often possess activities that are significantly different from those of their parents. By screening these libraries using innovative high-throughput assay techniques, it is possible to identify enzymes with new catalytic functions and physical properties.